MASSACHUSETTS INSTITUTE OF TECHNOLOGY 
ARTIFICIAL INTELLIGENCE LABORATORY 


A.I. Memo No. 1060 


July 1988 


The Alignment of Objects with Smooth Surfaces 


Shimon Ullman* and Ronen Basri* 


Abstract: This paper examines the recognition of rigid objects bounded by
smooth surfaces, using an alignment approach. The projected image of such an object
changes during rotation in a manner that is generally difficult to predict. An approach 
to this problem is suggested, using the 3-D surface curvature at the points along the 
silhouette. The curvature information requires a single number for each point along 
the object’s silhouette, the magnitude of the curvature vector at the point. We have 
implemented this method, and tested it on images of complex 3-D objects. Models of 
the viewed objects were acquired using three images of each object. The implemented 
scheme was found to give accurate predictions of the objects’ appearances for large 
transformations. Using this method, a small number of (viewer-centered) models can 
be used to predict the new appearance of an object from any given viewpoint. 
1 Introduction


Visual object recognition requires the identification of objects observed from different 
viewpoints. In recent years several attempts have been made to approach this problem 
using an alignment approach [5, 7, 9, 12, 17, 18]. In this paper we shall consider the 
recognition of rigid objects bounded by smooth surfaces, using an alignment approach. 

Alignment is a two-stage process. Given a model object and an image object, in
the first stage a transformation is sought, that would bring the model object to a 
position and orientation in space that corresponds to the projected image. Second, 
the appearance of the model following the transformation is predicted. The result is 
compared with the actual image, and the degree of match is used to decide whether 
the image is in fact an instance of the model. 

The first stage of the alignment process, namely the search for an aligning transformation,
will not be discussed here. The transformation may be determined by a small
set of corresponding features, identified in both the model and the image. For example,
three non-collinear points on the image, and their corresponding points on the model,
uniquely determine the transformation [8, 9, 12, 18]. Two points and a line, or three
lines, may also serve for this purpose.


1.1 The Prediction Problem 

In this paper we address the second stage of the alignment process. We
present an approach for solving the following problem: Let $\mathcal{M} = \{M_1, M_2, \ldots, M_n\}$ be
a set of object models. Let $\mathcal{T}$ be a set of transformations that includes rotations in
3-D space, translations and scale changes, followed by an orthographic projection. This
projection model assumes that the object is not too close to the camera. Given a model
$M \in \mathcal{M}$ and an aligning transformation $T \in \mathcal{T}$, predict the appearance in the image of
$M$ following the application of $T$.

The above definition of the set T of allowed transformations enables the prediction 
of the appearance of rigid objects from any given viewpoint. The general prediction 
problem extends the set T with other types of transformations, including for example 
articulated motion and distortion. This extension is beyond the scope of this paper. 
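
As a minimal sketch of what a transformation in the set T does to a single fixed object point (for example a cube corner), the following applies a rotation, a scaling and an image translation, and then projects orthographically. The function names `rotate_y`, `rotate_z` and `project` are hypothetical, and the rotation is restricted here to a rotation about Y followed by a rotation about Z, whereas the text allows any 3-D rotation.

```python
import math

def rotate_y(point, angle):
    """Rotate a 3-D point about the vertical Y axis (x' = x cos a + z sin a)."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)

def rotate_z(point, angle):
    """Rotate a 3-D point about the line of sight Z."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

def project(point, rot_y=0.0, rot_z=0.0, scale=1.0, shift=(0.0, 0.0)):
    """Rotate, scale and translate, then drop the depth coordinate (orthographic)."""
    x, y, _ = rotate_z(rotate_y(point, rot_y), rot_z)
    return (scale * x + shift[0], scale * y + shift[1])

# A fixed point on the object, rotated by 90 degrees about Y, scaled by 2, shifted by (1, 0).
corner = (1.0, 2.0, 0.0)
image_point = project(corner, rot_y=math.pi / 2, scale=2.0, shift=(1.0, 0.0))
```

This suffices for points that stay on the rim, such as sharp edges; it does not by itself handle rim points on smooth surfaces.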

In order to utilize edge maps in the image, we make the following definitions: Given 
an object O and a viewpoint v, the rim is the set of all the points on the object’s surface, 
whose normal is perpendicular to the visual axis [11]. This set is also called the contour 
generator [13]. A silhouette is an image generated by the orthographic projection of the 
rim. In the analysis below we assume that every point along the silhouette is generated
by a single rim point, and that the object lies in the neighbourhood of a rim point only
on one side of the tangential line of sight.

The prediction of the appearance of smooth objects is not at all straightforward. 
An edge map of an object usually contains the silhouette, which is generated by its rim. 
A rim that is generated by a sharp edge is stable on the object as long as the edge is 
visible. In contrast, a rim that is generated by a smooth surface changes continuously 
with the viewpoint. 

The problem of predicting the new appearance of a smooth object following a rotation
is illustrated in Figure 1. The figure shows a bird’s eye view of two rotating
objects, a cube (a and b) and an ellipsoid (c and d). For both objects, points p, q lie on the
object’s rim, and therefore their projections lie on its silhouette in the image. When the
cube rotates from position (a) to (b), p, q remain on the rim. Their new 3-D position
is easily determined, and therefore the new silhouette can be predicted in a straightforward
manner. In contrast, when the ellipsoid rotates from position (c) to (d), the new 3-D
position of p, q is no longer relevant, since these points no longer lie on the object’s
rim. The silhouette is now generated by a new set of points p', q' in (d). Figures 1(e)
and 1(f) show the ellipsoid in a frontal view before and after the rotation, compared to its
appearance if the rim, as a space curve, had been rotated by the same amount. The
conclusion is that the prediction problem for smooth objects is in general significantly
more complicated than that of objects with sharp edges.


1.2 Previous Approaches 

Two approaches have been suggested in the past to solve this prediction problem.
The first approach approximates the object’s surface by a set of 3-D wires [1]. The
transformation T is applied to each wire. The extreme wires are taken to be the object’s
silhouette. The second approach describes an object as a composition of either surface
or volumetric primitives that have simple geometrical structures [2, 3, 6, 7, 14, 16].
The transformation T is applied to each primitive. Since the primitives have simple
geometrical structures, their silhouettes can be easily predicted. The extreme points of
the collection of the primitives’ silhouettes are taken to be the object’s silhouette.

The decomposition approach works well for simple objects, but usually not for 
complex ones. The wire approach is often costly from a computational standpoint due to 
the large number of wires required, and the need to perform “hidden line elimination”. 
Finally, these approaches usually enable the prediction of the bounding contours only.
Internal contours and surface markings, which may have a significant role in shape-based
recognition, are often not treated.

Figure 1: Changes in the rim during rotation. (a) A bird’s eye view of a cube. (b) The cube
after rotation. In both (a) and (b) points p, q lie on the rim. (c) A bird’s eye view of an ellipsoid.
(d) The ellipsoid after rotation. The rim points p, q in (c) are replaced by p', q' in (d). (e) An
ellipsoid in a frontal view. (f) The ellipsoid rotated (outer), superimposed on the appearance
of the rim, as a space curve, after rotation by the same amount (inner).

This paper presents an alternative approach for the prediction problem. In this 
approach an object is represented by its silhouette, as seen from a particular viewpoint. 
Using the 3-D surface curvature of each point along the silhouette, it is possible to make 
an accurate estimation of the silhouette after large transformations. A few models of 
this kind are sufficient for predicting the object’s appearance from any given viewpoint. 


2 The Curvature Method 

The method is based on representing the surface curvature of points along the silhouette.
The basic idea is shown in Figure 2. Let X and Y be the main axes of the image plane,
and let the Z-axis be the line of sight. Consider an object O rotating by a rotation R around
the vertical axis Y. Let p be a point on its rim. The figure shows a section of the
object through p that is perpendicular to Y. Let $r_x$ be the radius of curvature at p in
this section, and let $\mathbf{r}_x$ be a vector of length $r_x$ parallel to the X-axis. When the object
rotates by R, point p ceases to be a rim point, and it is replaced by a new point p'
approximated by:

$$p' \approx R(p - \mathbf{r}_x) + \mathbf{r}_x \qquad (1)$$

The meaning of eq. (1) is the following. The point $o = p - \mathbf{r}_x$ is the center of
the circle of curvature of p. To predict the new rim point we first apply R to o. Let
$o' = R(p - \mathbf{r}_x)$. The new rim point is then $p' = o' + \mathbf{r}_x$. This approximation holds as
long as the circle of curvature provides a good approximation to the section at p.
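
The following is a minimal sketch of eq. (1) restricted to the horizontal section, assuming the rotation convention x' = x cos a + z sin a for a rotation by angle a about Y. The function name `predict_rim_x` is hypothetical. For a section that is exactly circular the prediction is exact, which the example checks.

```python
import math

def predict_rim_x(x, z, r_x, angle):
    """Image x-coordinate of the new rim point according to eq. (1).

    (x, z) is the current rim point in the section, r_x its radius of
    curvature there, and angle the rotation about the Y axis in radians.
    """
    # Centre of the circle of curvature, o = p - r_x (r_x is parallel to X).
    ox, oz = x - r_x, z
    # Rotate the centre about Y, then add r_x back along X.
    return ox * math.cos(angle) + oz * math.sin(angle) + r_x

# Circular section of radius 2 centred at (3, 5); its rim point has x = 3 + 2.
cx, cz, radius, angle = 3.0, 5.0, 2.0, math.radians(30)
predicted = predict_rim_x(cx + radius, cz, radius, angle)
# For a circle the true new rim point is the rotated centre plus the radius.
exact = cx * math.cos(angle) + cz * math.sin(angle) + radius
```

With `r_x = 0` the function reduces to an ordinary rotation of the point, which corresponds to a sharp edge or a surface marking.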

It is worth noting that “sharp” boundaries, such as the cube edges in Fig. 1, or 
markings on the surface itself, do not require a special treatment. They are included in 
eq. (1) as the special case r = 0. 

So far we have considered rotations around the vertical Y axis. We shall next consider
the effect on the silhouette of a rotation around an arbitrary axis in space. Any
3-D rotation can be decomposed into two successive rotations: a first rotation around
some axis V in the image plane, followed by a second rotation around the Z-axis. The
effect of rotating the object around the line of sight Z is, of course, easy to predict. 
The problem, therefore, is to predict the appearance of the object following a rotation 
about an axis V lying in the image plane. 

In general, the vector of curvature radius $\mathbf{r}_x$ used in eq. (1) would depend on the
rotation axis. Let $r_x, r_y$ be the radii of curvature at p for rotations about the Y and X
axes respectively. Proposition 1 below states that the radius of curvature for a rotation
around any axis can be determined from $r_x, r_y$ alone.

Figure 2: The curvature method. (a) A horizontal section of an ellipsoid. p is a point on the
rim, r is the radius of curvature at p, o is the center of the curvature circle, and a is the
intersection of the V-axis with this section. (b) The section rotated. p' is the new rim point,
and it is approximated by eq. (1).

Proposition 1: Let p be a rim point, and let $V_\alpha$ be an axis lying within the image
plane and forming an angle $\alpha$ with the X-axis. The curvature radius at p for rotations
around $V_\alpha$ is given by:

$$r_\alpha = r_y \cos\alpha - r_x \sin\alpha \qquad (2)$$

(A proof is given in the appendix.)

From this proposition and eq. (1) we can predict the position of p', the new rim
point, for a rotation around an arbitrary axis within the image plane, and consequently
any 3-D axis as well, using the two parameters $r_x, r_y$ at p. Proposition 2 below shows
that, in fact, a single parameter suffices.

Proposition 2: Let $\mathbf{r} = (r_x, r_y)$ be the curvature vector at p, and let $\mathbf{t}$ be the tangent
vector to the silhouette at p. Then $\mathbf{r} \cdot \mathbf{t} = 0$, that is, $\mathbf{r}$ is perpendicular to $\mathbf{t}$. (A proof is
given in the appendix.)

It follows from the two propositions above that a single number is sufficient to 
determine the radius of curvature for a rotation around any axis in the image plane. 
This number is the magnitude of the curvature vector, $\|\mathbf{r}\|$. All other parameters can
be derived from it as follows. Let $\theta$ be the angle between the tangent vector to the
silhouette $\mathbf{t}$ and the X-axis, then:

$$r_x = \|\mathbf{r}\| \sin\theta$$

$$r_y = \|\mathbf{r}\| \cos\theta \qquad (3)$$

$$r_\alpha = \|\mathbf{r}\| \cos(\theta + \alpha)$$

The scheme is therefore the following. An object model M is a 2-D (orthographic) 
projection of its visible contours (including its sharp and smooth boundaries, as well 
as internal markings), as observed from a particular viewing direction. To represent 
the entire object, a number of views would be required [10]. As shown in the examples 
below, this number is usually small. Each point along the silhouette has associated 
with it, along with its spatial coordinates, the magnitude ||r|| of its curvature vector 
in eq. (3). Given a transformation T, translation, scaling and rotation around the line 
of sight are applied to M in a straightforward manner. The effect of rotation around 
an arbitrary axis in the image plane is computed as follows: first, for each point on the
model, the radius of curvature r with respect to the rotation axis is determined using
eq. (3). Once r is known, the new position of the point in the image is determined
using eq. (1), where a vector of size r perpendicular to the rotation axis is substituted
for $\mathbf{r}_x$.
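
The prediction step for a rotation about an in-plane axis can be sketched as follows. This is an illustration under stated assumptions, not the implemented system: the function names are hypothetical, the curvature components `r_x`, `r_y` are taken as given (with the curvature vector pointing from the centre of curvature to the rim point, as in eq. (1)), the radius for the rotation axis is obtained from eq. (2), and the rotation is computed with Rodrigues' formula.

```python
import math

def rotate_about_axis(v, axis, angle):
    """Rotate vector v about a unit axis by angle (Rodrigues' formula)."""
    ax, ay, az = axis
    x, y, z = v
    c, s = math.cos(angle), math.sin(angle)
    dot = ax * x + ay * y + az * z
    cross = (ay * z - az * y, az * x - ax * z, ax * y - ay * x)
    return tuple(c * vi + s * ci + (1 - c) * dot * ai
                 for vi, ci, ai in zip(v, cross, axis))

def predict_rim_point(p, r_x, r_y, alpha, angle):
    """New rim point after rotating by angle about the in-plane axis at angle alpha to X."""
    axis = (math.cos(alpha), math.sin(alpha), 0.0)
    # In-plane unit vector perpendicular to the rotation axis.
    normal = (-math.sin(alpha), math.cos(alpha), 0.0)
    # Radius of curvature with respect to this axis, eq. (2).
    r_alpha = r_y * math.cos(alpha) - r_x * math.sin(alpha)
    offset = tuple(r_alpha * n for n in normal)
    # Eq. (1): rotate the centre of curvature, then add the offset back.
    centre = tuple(pi - oi for pi, oi in zip(p, offset))
    rotated = rotate_about_axis(centre, axis, angle)
    return tuple(ri + oi for ri, oi in zip(rotated, offset))

# Sphere of radius 2 centred at (1, 2, 3); rim point in image direction u.
centre_c, rho, phi = (1.0, 2.0, 3.0), 2.0, 0.7
u = (math.cos(phi), math.sin(phi))
p = (centre_c[0] + rho * u[0], centre_c[1] + rho * u[1], centre_c[2])
predicted = predict_rim_point(p, rho * u[0], rho * u[1], alpha=0.4, angle=0.5)
```

For a sphere the predicted point coincides with a true rim point of the rotated sphere (the rotated centre displaced by the radius in the same image direction); for general surfaces the result is only as good as the circle-of-curvature approximation.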

In this approach an object is represented using a number of viewer-centered descriptions,
rather than a single object-centered representation. Each description covers
a range of possible viewing angles, and to represent the entire object a number of descriptions
are required. This number depends on the object shape and on the complexity
of its aspect graph [10]. As shown in the examples below, this number is expected to be
small. The cars in these examples required 4 models to cover all common views, which
included all vertical rotations and elevations of ±30°. Because of the orthographic projection
approximation, if the object is to be recognized both from large distances and
in close-up views, additional models will be required. The computations required in this
scheme during the prediction stage are simple; for example, no hidden-line elimination
is necessary.

It is of interest to note that the use of multiple viewer-centered descriptions appears
to be consistent with a number of recent psychophysical and physiological investigations.
For example, Perret et al. [15] have investigated the response properties of face-sensitive
cells in area STS of the macaque’s visual cortex. They have found that cells
typically respond to a wide range of 3-D orientations, but not to all viewing directions. 
A face-selective cell that responds to e.g. a face-on view will typically not respond to 
a profile view, but will respond to a wide range of intermediate orientations. Perret 
et al conclude that “High level viewer-centered descriptions are an important stage in 
the analysis of faces” [p. 314]. The use of viewer-centered rather than object-centered 
descriptions [14], each covering a wide range of orientations, is consistent with the 
representation scheme described above. 


3 Model Construction 

We have implemented the alignment scheme described above, and tested it on images 
of 3-D objects. To apply the scheme, models of the viewed objects must be acquired. 
For our purpose, an object model must contain the spatial coordinates and the radii of 
curvature of the object’s visible contours. The required 3-D information can be obtained 
during a learning period using various 3-D cues, such as stereo information and shading. 

To estimate the radii of curvature we have used three edge pictures of each object.
The radii of curvature are estimated using the following procedure. Consider three
silhouette pictures (denoted by A, B and C) that are taken from three different viewpoints
along a circle in space perpendicular to the Y-axis (Figure 3). Suppose $\alpha$ is the
rotation angle between pictures A and B, and $\beta$ is the rotation angle between pictures
A and C. Let $p_1$, $p_2$ and $p_3$ be three corresponding points in A, B and C respectively.
Since the camera was rotated around the Y-axis between successive pictures, all
three points share the same y coordinate, so that $p_1 = (x_1, y, z_1)$, $p_2 = (x_2, y, z_2)$ and
$p_3 = (x_3, y, z_3)$. According to eq. (1):


$$x_2 = (x_1 - r_x)\cos\alpha + z_1\sin\alpha + r_x \qquad (4)$$

$$x_3 = (x_1 - r_x)\cos\beta + z_1\sin\beta + r_x \qquad (5)$$

These are two linear equations in the two unknown parameters $z_1$ and $r_x$. Hence:

$$z_1 = \frac{x_1(\cos\alpha - \cos\beta) - x_2(1 - \cos\beta) + x_3(1 - \cos\alpha)}{(1 - \cos\alpha)\sin\beta - \sin\alpha(1 - \cos\beta)} \qquad (6)$$

$$r_x = \frac{x_1\sin(\alpha - \beta) + x_2\sin\beta - x_3\sin\alpha}{(1 - \cos\alpha)\sin\beta - \sin\alpha(1 - \cos\beta)} \qquad (7)$$

The denominator does not vanish as long as $\alpha \neq \beta$ and both angles are nonzero and smaller than $\pi$ in magnitude.
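
The estimation step can be sketched directly from eqs. (4) to (7). The function names `estimate_depth_and_radius` and `view_x` are hypothetical, and the numbers in the example are synthetic values chosen for illustration, not measurements from the memo.

```python
import math

def estimate_depth_and_radius(x1, x2, x3, alpha, beta):
    """Solve eqs. (4)-(5) for the depth z_1 and the radius r_x via eqs. (6)-(7)."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    denom = (1 - ca) * sb - sa * (1 - cb)
    if abs(denom) < 1e-12:
        raise ValueError("degenerate rotation angles: denominator vanishes")
    z1 = (x1 * (ca - cb) - x2 * (1 - cb) + x3 * (1 - ca)) / denom
    r_x = (x1 * math.sin(alpha - beta) + x2 * sb - x3 * sa) / denom
    return z1, r_x

def view_x(x1, z1, r_x, angle):
    """Silhouette x-coordinate after a rotation about Y, as in eqs. (4)-(5)."""
    return (x1 - r_x) * math.cos(angle) + z1 * math.sin(angle) + r_x

# Synthetic check: generate two extra views from known values, then recover them.
x1, z1_true, r_true = 4.0, 1.5, 0.8
alpha, beta = math.radians(20), math.radians(-35)
x2 = view_x(x1, z1_true, r_true, alpha)
x3 = view_x(x1, z1_true, r_true, beta)
z_est, r_est = estimate_depth_and_radius(x1, x2, x3, alpha, beta)
```

With real edge maps the three x-coordinates are noisy and eq. (1) is itself an approximation, so the recovered values should be read as estimates.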

Using a similar technique we can also calculate $r_y$. For most of the points an independent
estimation of $r_y$ is in principle redundant (see eq. (3)), but it can be used
to improve the estimate of the curvature. In this manner five pictures can be used to
create a model, three for a rotation around the Y-axis, and three for a rotation around the
X-axis, with the central picture common to both sets. The final model consists of an
edge map of the central picture, together with the depth coordinates and the estimated
magnitude of the curvature vector $(r_x, r_y)$ at each point.


Note that identifying corresponding points in the pictures is straightforward in this 
procedure. When the rotation is about the Y-axis, the corresponding points must lie
on the same horizontal line. Each contour point therefore usually has a small number 
of candidate corresponding points to be considered. Details of this matching procedure 
will not be discussed here. 


By setting $\beta = -\alpha$ the equations above can be simplified as follows:

$$z_1 = \frac{x_2 - x_3}{2\sin\alpha} \qquad (8)$$

$$r_x = \frac{x_2 + x_3 - 2x_1\cos\alpha}{2(1 - \cos\alpha)} \qquad (9)$$


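If the angle $\alpha$ is not known, but is assumed to be small, we can define new quantities
$\hat{z}$, $\hat{r}_x$ as follows:

$$\hat{z} = z\alpha \approx \frac{x_2 - x_3}{2} \qquad (10)$$

$$\hat{r}_x = (r_x - x_1)\alpha^2 \approx x_2 + x_3 - 2x_1 \qquad (11)$$

This approximation uses $\sin\alpha \approx \alpha$ and $\cos\alpha \approx 1 - \frac{\alpha^2}{2}$. In this case the aligning
transformation should provide, instead of a rotation angle $\theta$, the ratio $\theta/\alpha$:

$$x' \approx x + z\theta + (r_x - x)\frac{\theta^2}{2} = x + \hat{z}\left(\frac{\theta}{\alpha}\right) + \frac{\hat{r}_x}{2}\left(\frac{\theta}{\alpha}\right)^2 \qquad (12)$$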

The ratio $\theta/\alpha$ can be determined during the alignment process if we take these approximations
into account. Suppose for instance the alignment transformation is determined
by a three-point correspondence. In this case a set of six equations describing rotation
in 3-D space, translation and scale must be solved [9, 18]. If the three points lie on the
object’s contour, we can replace two of the six equations, those describing rotations
around the X and Y axes, by eq. (12). Consequently we obtain a new set of six equations
with six unknown parameters to solve. This set will usually have a small number
of solutions, but the details will not be considered here further. The range of rotations
covered by a single model would depend on the object’s shape and on the similarity to
other models. The results shown in Section 4 also hold for this approximation.
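
The small-angle variant of eqs. (10) to (12) can be sketched as follows, assuming the three pictures are taken at rotations of 0, +alpha and -alpha. The function names are hypothetical and the example values are synthetic. Note that the model is built without knowing alpha; only the ratio theta/alpha is needed at prediction time.

```python
import math

def small_angle_model(x1, x2, x3):
    """Eqs. (10)-(11): scaled depth and curvature from views at 0, +alpha, -alpha."""
    z_hat = (x2 - x3) / 2.0
    r_hat = x2 + x3 - 2.0 * x1
    return z_hat, r_hat

def predict_x(x1, z_hat, r_hat, ratio):
    """Eq. (12): predicted x after a rotation theta, where ratio = theta / alpha."""
    return x1 + z_hat * ratio + 0.5 * r_hat * ratio ** 2

def exact_x(x1, z1, r_x, angle):
    """Reference value from eq. (1), used only to generate and check the example."""
    return (x1 - r_x) * math.cos(angle) + z1 * math.sin(angle) + r_x

# Synthetic example: alpha is used to generate the views but not to build the model.
x1, z1, r_x, alpha = 4.0, 1.5, 0.8, 0.05
z_hat, r_hat = small_angle_model(x1, exact_x(x1, z1, r_x, alpha), exact_x(x1, z1, r_x, -alpha))
predicted = predict_x(x1, z_hat, r_hat, ratio=0.5)
reference = exact_x(x1, z1, r_x, 0.5 * alpha)
```

The difference between `predicted` and `reference` grows with the size of the angles, since the approximation drops third-order terms.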


4 Implementation 

A prototype system for object recognition using alignment, which predicts the appearance
of objects using the curvature method, was implemented on a Symbolics 3670 machine.
Pictures comprising 512×512 pixels were taken using a vidicon Cohu camera. Edge
maps of the pictures were created using the Canny edge detector [4]. The system first
constructs object models comprising depth values and curvature radii as described
in Section 3. Models can be constructed in this system using either three images with
rotations around the Y-axis, or five images with rotations around both the Y and
X axes. The internal model can then be used to predict the appearance of the model
following 3-D rotation, translation and scaling, using the curvature method described
in Section 2.

Two models of similar cars, a VW and a Saab, were created (Figure 4). For each
model three pictures were taken, with $\alpha$ and $\beta$ (the angles between successive pictures,
see Section 3) being ±30° around the Y-axis. For each car, the procedure resulted in
a single model, comprising the edge map of the central image, together with the
approximated depth and curvature along the edges. It was found that a single model of
this type yields accurate predictions of the appearance of the object within the entire
60° of rotation around the Y-axis.
Figure 5 shows four pictures, two of each car, rotated by ±15°. Such rotations
already create large deformations of the images (Figure 6). Figure 7 shows the results of 
aligning the models with the images. An approximation to the transformation (rotation, 
translation and scale) can be supplied by different alignment routines, e.g., using three 
corresponding points [18]. It can be seen that, by using the alignment procedure, a single 
model gives accurate fits to the object seen from different viewing positions. Figure 8 
shows the result of matching the two cars with the incorrect models. The discrepancy 
between the image and the aligned model is significantly higher than in Figure 7. A 
simple distance metric between the image contours and the aligned model was therefore 
sufficient to select the correct model. It is worth noting that accurate predictions were 
obtained despite the fact that (1) the objects have complex 3-D shapes, and (2) we 
have used crude approximations to the radii of curvature using three pictures. 
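
The text does not specify which distance metric was used. As one possible illustration, assuming contours are given as lists of 2-D points, the following sketch scores each aligned model by the mean distance from every image contour point to its nearest model point and selects the lowest-scoring model. The function names and the toy data are hypothetical.

```python
import math

def mean_nearest_distance(image_points, model_points):
    """Average distance from each image contour point to its nearest model point."""
    total = 0.0
    for ix, iy in image_points:
        total += min(math.hypot(ix - mx, iy - my) for mx, my in model_points)
    return total / len(image_points)

def select_model(image_points, aligned_models):
    """Return the name of the aligned model with the smallest contour distance."""
    return min(aligned_models,
               key=lambda name: mean_nearest_distance(image_points, aligned_models[name]))

# Toy contours: the first model lies close to the image contour, the second does not.
image = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
models = {
    "model_a": [(0.0, 0.1), (1.0, 0.1), (2.0, 0.1)],
    "model_b": [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)],
}
best = select_model(image, models)
```

This brute-force search is quadratic in the number of contour points; it is meant only to make the selection step concrete.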


5 Summary

In this paper we have proposed a method for predicting the new appearance of an object
with smooth surfaces following a similarity transformation (3-D rotation, translation and
scaling). The method uses the 3-D surface curvature along the object contours. We have 
shown that a single parameter, the magnitude of the curvature vectors at these points, 
is sufficient to recover their curvature radii for a rotation around any given axis. Three 
pictures are in principle sufficient for approximating the radii of curvature for most 
contour points, and five can be used to estimate the components $r_x, r_y$ independently.

The implemented scheme was found to give accurate results for large transformations.
In the scheme we have proposed, each object is represented by a number of
models, each covering a range of potential viewpoints. The results suggest that only a 
small number of such models are required to predict the new appearance of an object 
from any viewpoint. 


Appendix 


Consider a surface defined by the implicit function $F(x, y, z) = 0$. Assuming an orthographic
projection, where Z is the visual axis, the rim is defined by the set of points
on the surface where $F_z(x, y, z) = 0$. Let $p_0 = (x_0, y_0, z_0)$ be a rim point, that is
$F(p_0) = F_z(p_0) = 0$. We consider the cases where F has second derivatives at $p_0$ and the
surface normal there is not zero (that is, either $F_x(p_0) \neq 0$ or $F_y(p_0) \neq 0$). In case one
of them is zero we redefine the axes such that both $F_x(p_0)$ and $F_y(p_0)$ will be nonzero.

Figure 5: The image objects. (a) A VW car rotated by -15° with respect to the model.
(b) A VW car rotated by +15° with respect to the model. (c) A Saab car rotated by -15° with
respect to the model. (d) A Saab car rotated by +15° with respect to the model.

Figure 6: Deformation of the images with respect to the models. (a) A deformation of the VW
car following a rotation of -15°. (b) A deformation of the VW car following a rotation of +15°.
(c) A deformation of the Saab car following a rotation of -15°. (d) A deformation of the Saab
car following a rotation of +15°.

Figure 7: Correct alignment of the models with the images. (a) Alignment of the VW model
with the first VW image. (b) Alignment of the same VW model with the second VW image.
(c) Alignment of the Saab model with the first Saab image. (d) Alignment of the same Saab
model with the second Saab image.

Figure 8: Matching the images with incorrect models. (a) Matching the first VW image to
the Saab model. (b) Matching the second VW image to the same Saab model. (c) Matching
the first Saab image to the VW model. (d) Matching the second Saab image to the same VW
model.

Moreover, we assume that $F_{zz}(p_0) \neq 0$. By this we ignore points where a continuous
line yields a single point in the silhouette, and points with infinite radius of curvature.
These points may change their place unexpectedly during rotation.


Lemma 1: Let $F(x, y, z) = 0$ be a surface description, and let $p_0 = (x_0, y_0, z_0)$ be
a rim point, i.e. $F(p_0) = F_z(p_0) = 0$. The curvature radii of $p_0$ with respect to the Y
and X axes are given by:

$$r_x = -\frac{F_x(p_0)}{F_{zz}(p_0)}$$

$$r_y = -\frac{F_y(p_0)}{F_{zz}(p_0)}$$

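Proof: Consider the space curve defined by the implicit function $F(x, y_0, z) = 0$.
According to the implicit function theorem, since $F_x(p_0) \neq 0$ and $F_{zz}(p_0) \neq 0$, $x(z)$ is
a well defined function in a neighborhood of $p_0$, and:

$$\frac{dx}{dz} = -\frac{F_z}{F_x}$$

$$\frac{d}{dz}F_z = F_{zz} + F_{zx}\frac{dx}{dz} = \frac{1}{F_x}(F_{zz}F_x - F_{zx}F_z)$$

$$\frac{d}{dz}F_x = F_{xz} + F_{xx}\frac{dx}{dz} = \frac{1}{F_x}(F_{xz}F_x - F_{xx}F_z)$$

And since:

$$\frac{d^2x}{dz^2} = \frac{d}{dz}\frac{dx}{dz} = \frac{d}{dz}\left(-\frac{F_z}{F_x}\right) = -\frac{F_x\frac{d}{dz}F_z - F_z\frac{d}{dz}F_x}{F_x^2}$$

We obtain:

$$\frac{d^2x}{dz^2} = \frac{-F_{zz}F_x^2 + F_{zx}F_xF_z + F_{xz}F_xF_z - F_{xx}F_z^2}{F_x^3}$$

$F_z(p_0) = 0$, therefore:

$$\frac{dx}{dz}(z_0) = 0$$

$$\frac{d^2x}{dz^2}(z_0) = -\frac{F_{zz}}{F_x}$$

For a curve $x(z)$, the radius of curvature at $z_0$ is given by:

$$r_x = \frac{\left(1 + \left(\frac{dx}{dz}(z_0)\right)^2\right)^{3/2}}{\frac{d^2x}{dz^2}(z_0)}$$

Substituting the appropriate terms we obtain:

$$r_x = -\frac{F_x}{F_{zz}}$$

And in a similar way:

$$r_y = -\frac{F_y}{F_{zz}}$$
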

Proposition 1: Let $F(x, y, z) = 0$ be a surface description, and let $p_0$ be a rim
point, i.e. $F(p_0) = F_z(p_0) = 0$. Let $V_\alpha$ be an axis lying in the image plane and forming
an angle $\alpha$ with the positive X-axis. The radius of curvature at $p_0$ with respect to $V_\alpha$
is given by:

$$r_\alpha = r_y\cos\alpha - r_x\sin\alpha$$

Proof: Let $G(x', y', z) = 0$ be the surface $F(x, y, z) = 0$ rotated around the Z-axis by
the angle $-\alpha$, i.e.

$$G(x', y', z) = F(x'\cos\alpha - y'\sin\alpha,\; x'\sin\alpha + y'\cos\alpha,\; z)$$

After such a rotation $V_\alpha$ coincides with the X-axis, therefore:

$$r^F_\alpha = r^G_{y'}$$

where $r^F$, $r^G$ are radii of curvature for the surfaces F, G respectively. According to
Lemma 1:

$$r^G_{y'} = -\frac{G_{y'}}{G_{zz}}$$

Since:

$$G_{y'} = -F_x\sin\alpha + F_y\cos\alpha$$

$$G_z = F_z = 0$$

$$G_{zz} = F_{zz}$$

We obtain:

$$r_\alpha = \frac{-F_y\cos\alpha + F_x\sin\alpha}{F_{zz}} = r_y\cos\alpha - r_x\sin\alpha$$


Proposition 2: Let $F(x, y, z) = 0$ be a surface description, and let $p_0$ be a rim
point, i.e. $F(p_0) = F_z(p_0) = 0$. Let $\mathbf{r} = (r_x, r_y)$ be the curvature vector at $p_0$, and let
$\mathbf{t}$ be the tangent vector to the silhouette at $p_0$. Then $\mathbf{r} \cdot \mathbf{t} = 0$, that is, $\mathbf{r} \perp \mathbf{t}$.

Proof: The point $p_0$ satisfies the two constraints $F(p_0) = 0$ and $F_z(p_0) = 0$. According
to the implicit function theorem, since $F_y(p_0) \neq 0$ and $F_{zz}(p_0) \neq 0$, $y(x)$, $z(x)$ are
well defined functions in a neighborhood of $p_0$. The tangent vector $\mathbf{t}$ to $y(x)$ is in the
direction $(1, \frac{dy}{dx})$ in the XY plane, and since $\frac{dy}{dx} = -\frac{F_x}{F_y}$, $\mathbf{t}$ is in the direction $(-F_y, F_x)$.

According to Lemma 1, the vector of curvature radii is given by:

$$\mathbf{r} = (r_x, r_y) = \left(-\frac{F_x}{F_{zz}}, -\frac{F_y}{F_{zz}}\right)$$

Therefore:

$$\mathbf{r} \cdot \mathbf{t} = \frac{F_xF_y}{F_{zz}} - \frac{F_xF_y}{F_{zz}} = 0$$
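
As an illustrative check of Lemma 1 and Proposition 2, consider a sphere of radius $\rho$ centred at the origin. The sphere and the symbol $\rho$ are assumptions chosen for illustration, and the signs follow the convention of Lemma 1.

```latex
\begin{align*}
F(x, y, z) &= x^2 + y^2 + z^2 - \rho^2, \qquad F_z = 2z, \qquad F_{zz} = 2 \\
F_z(p_0) = 0 \;&\Rightarrow\; p_0 = (x_0, y_0, 0), \qquad x_0^2 + y_0^2 = \rho^2 \\
\mathbf{r} &= \left(-\frac{F_x}{F_{zz}}, -\frac{F_y}{F_{zz}}\right) = (-x_0, -y_0), \qquad \|\mathbf{r}\| = \rho \\
\mathbf{t} &\parallel (-F_y, F_x) = (-2y_0, 2x_0) \\
\mathbf{r} \cdot \mathbf{t} &= 2x_0y_0 - 2x_0y_0 = 0
\end{align*}
```

The magnitude of the curvature vector equals the sphere's radius at every rim point, and the vector is perpendicular to the circular silhouette, as the proposition states.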